{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## EfficientSAM Adapter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A Quick Overview\n",
    "\n",
    "<img width=\"880\" height=\"380\" src=\"../figs/EfficientSAM/EfficientSAM.png\">\n",
    "\n",
    "The huge computation cost of [SAM](https://github.com/facebookresearch/segment-anything) model has limited its applications to wider real-world applications. To address this limitation, [EfficientSAMs](https://github.com/yformer/EfficientSAM) provide lightweight SAM models that exhibits decent performance with largely reduced complexity. This is based on leveraging SAM-leveraged masked image pertraining (SAMI), which learns to reconstruct features from SAM image encoder for effective visual representation learning.\n",
    "\n",
    "Our goal is to integrate medical specific domain knowledge into the lightweight EfficientSAM model through adaptation technique. Therefore, we only utilize the pre-trained EfficientSAM weights without reperforming the SAMI process. Like our original [Medical SAM Adapter](https://arxiv.org/abs/2304.12620), we achieve efficient migration from the original SAM to medical images by adding simple Adapters in the network."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training\n",
    "We have unified the interfaces of EfficientSAM and SAM, and training EfficientSAM Adapter can be done using the same command as the SAM adapter:\n",
    "``python train.py -net efficient_sam -data_path data/isic -sam_ckpt checkpoint/efficient_sam/efficient_sam_vits.pt -image_size 1024 -vis 100 -mod sam_adpt``\n",
    "\n",
    "The pretrained weight of EfficientSAM can be downloaded here:\n",
    "| EfficientSAM-S | EfficientSAM-Ti |\n",
    "|------------------------------|------------------------------|\n",
    "| [Download](https://github.com/yformer/EfficientSAM/blob/main/weights/efficient_sam_vits.pt.zip) |[Download](https://github.com/yformer/EfficientSAM/blob/main/weights/efficient_sam_vitt.pt)|"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Performance VS SAM \n",
    "**Setup**: Using a single Nvidia RTX 3090 GPU, the batch_size was set to 2. The resolution of input image is 1024.\n",
    "\n",
    "#### ISIC\n",
    "| Baseline     | Backbone  | DICE   | mIou | Memory  |\n",
    "| ------------ | --------- | ------ | ---- | ------- |\n",
    "| SAM          | VIT-Base  | 0.9225 | 0.8646 | 22427 M |\n",
    "| EfficientSAM | VIT-Small | 0.9091 | 0.8463 | 21275 M  |      \n",
    "| EfficientSAM | VIT-Tiny  | 0.9095 | 0.8437  |  15713 M  |\n",
    "\n",
    "#### REFUGE\n",
    "| Baseline     | Backbone  | DICE   | mIou | Memory  |\n",
    "| ------------ | --------- | ------ | ---- | ------- |\n",
    "| SAM          | VIT-Base  | 0.9085 | 0.8423 | 22427 M |\n",
    "| EfficientSAM | VIT-Small | 0.8691 | 0.7915 | 21275 M  |      \n",
    "| EfficientSAM | VIT-Tiny  | 0.7999 | 0.6949 |  15713 M  |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Performance Under different resolution \n",
    "**Setup**: Using a single Nvidia RTX 3090 GPU, the batch_size was set to 2. Using VIT-small as the backbone.\n",
    "\n",
    "#### ISIC\n",
    "| Resolution  | DICE   | mIou | Memory  |\n",
    "| --------- | ------ | ---- | ------- |\n",
    "| 1024  | 0.9091 | 0.8463 | 22427 M |\n",
    "| 512 | 0.9134 | 0.8495 | 21275 M  |      \n",
    "| 256  | 0.9080 | 0.8419  |  15713 M  |\n",
    "\n",
    "#### REFUGE\n",
    "| Resolution  | DICE   | mIou | Memory  |\n",
    "| --------- | ------ | ---- | ------- |\n",
    "| 1024  | 0.8691 | 0.7915 | 22427 M |\n",
    "| 512 | 0.8564 | 0.7673 | 21275 M  |      \n",
    "| 256  | 0.7432 | 0.6244  |  15713 M  |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The decreasing curve of loss and the performance curve.\n",
    "#### Backbone： VIT-Small\n",
    "<p float=\"left\">\n",
    "  <img src=\"../figs/EfficientSAM/EfficientSAM-S (ISIC)_loss.png\" width=\"200\" />\n",
    "  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
    "  <img src=\"../figs/EfficientSAM/EfficientSAM-S (ISIC)_performance.png\" width=\"200\" /> \n",
    "  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
    "</p>\n",
    "<p float=\"left\">\n",
    "  <img src=\"../figs/EfficientSAM/EfficientSAM-S (REFUGE)_loss.png\" width=\"200\" /> \n",
    "  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
    "  <img src=\"../figs/EfficientSAM/EfficientSAM-S (REFUGE)_performance.png\" width=\"200\" /> \n",
    "</p>\n",
    "\n",
    "#### Backbone： VIT-Tiny\n",
    "<p float=\"left\">\n",
    "  <img src=\"../figs/EfficientSAM/EfficientSAM-Ti (ISIC)_loss.png\" width=\"200\" />\n",
    "  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
    "  <img src=\"../figs/EfficientSAM/EfficientSAM-Ti (ISIC)_performance.png\" width=\"200\" /> \n",
    "</p>\n",
    "\n",
    "<p float=\"left\">\n",
    "  <img src=\"../figs/EfficientSAM/EfficientSAM-Ti (REFUGE)_loss.png\" width=\"200\" /> \n",
    "  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;\n",
    "  <img src=\"../figs/EfficientSAM/EfficientSAM-Ti (REFUGE)_performance.png\" width=\"200\" /> \n",
    "</p>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.7.16 ('general')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.7.16"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "e7f99538a81e8449c1b1a4a7141984025c678b5d9c33981aa2a3c129d8e1c90d"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
